A Multistrategy Data Mining Approach to Classification

Authors

  • Mordechai Gal-Or
  • William E. Spangler
  • Jerrold H. May
Abstract

Our research explores the use of ensemble, or multistrategy, learning techniques for inducing and managing patterns of knowledge from organizational data. Specifically, we are exploring the use of data mining techniques to build an ensemble classification system – i.e., a system that incorporates multiple machine learning techniques to generate multiple models from existing data and make predictions about new observations. Our research is motivated by a real-world business problem. The emergence of the digital personal video recorder (PVR) is expected, over time, to cause profound changes in television viewing, as viewers use the new technology to time-shift viewing and skim over or eliminate ‘in stream’ commercials. This trend is a significant threat to television advertisers and service providers, because it jeopardizes the traditional means by which advertising finances so-called ‘free’ programming. Although a number of modeling methods are potentially useful for analyzing television viewing data and classifying specific viewer types, the complexity of the domain means we cannot know a priori which methods will be most accurate in specific situations. The effectiveness of a particular method depends on a number of factors, including the characteristics of the viewer, the prevalence of target viewers in the overall population, the specific viewer attributes to be predicted, the asymmetry of misclassification costs, and other characteristics of the viewing data – including types of programs viewed, time of day, and so on. Because it is unlikely that any single method could perform optimally under all of these circumstances, we are developing an ensemble classifier composed of a number of different analytic methods. This classifier would run each method on various television viewing data sets and attempt to construct a single prediction about the viewer from the collective predictions of the various methods.
We have conducted preliminary analyses of viewer data obtained from Nielsen Media Services, Inc. (NMSI), and developed an initial prototype of the data mining component from those analyses. Our initial study of viewing behavior for five target gender/age segments suggests that gains in performance are possible even with simple democratic voting schemes – i.e., where each method has a single vote. Our goal now is to determine whether we can do better by using more sophisticated combination strategies. We intend to approach the problem in two phases. The first phase will explore the combination of multiple methods in a controlled experiment using simulated data, while the second will apply lessons learned from the controlled experiment to the analysis of actual television viewing data obtained from NMSI.
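
The simple democratic voting scheme mentioned above, in which each method casts a single vote, might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the three rule-based classifiers, the attribute names (`sports_hours`, `daytime_hours`, `news_hours`), the thresholds, and the segment labels are all hypothetical stand-ins for real trained models and real viewing data.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A "method" is any function that maps a viewer record to a predicted label.
Classifier = Callable[[dict], Hashable]


def majority_vote(classifiers: Sequence[Classifier], record: dict) -> Hashable:
    """Return the label chosen by the most classifiers (one vote each).

    Ties go to the label that received its first vote earliest, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    votes = [clf(record) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy classifiers standing in for trained models.
def by_sports(r: dict) -> str:
    return "male_18_34" if r["sports_hours"] > 5 else "other"


def by_daytime(r: dict) -> str:
    return "other" if r["daytime_hours"] > 10 else "male_18_34"


def by_news(r: dict) -> str:
    return "other" if r["news_hours"] > 3 else "male_18_34"


# Hypothetical viewer record (weekly viewing hours by program type).
viewer = {"sports_hours": 8, "daytime_hours": 12, "news_hours": 1}
prediction = majority_vote([by_sports, by_daytime, by_news], viewer)
print(prediction)
```

A more sophisticated combination strategy could, for example, replace the equal votes with per-method weights; the sketch above deliberately covers only the equal-vote baseline.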

Similar resources

A Methodology and Life Cycle Model for Data Mining and Knowledge Discovery in Precision Agriculture

This paper presents a methodology for data mining and knowledge discovery in large, distributed and heterogeneous databases. In order to obtain potentially interesting patterns, relationships, and rules from such large and heterogeneous data collections, it is essential that a methodology be developed to take advantage of the suite of existing methods and tools available for data mining and kno...

Customer Retention Based on the Number of Purchase: A Data Mining Approach

Purpose: This study seeks to identify a relationship between the number of purchases and the income the customer brings to the company. The aim is to find customers who buy more than one life insurance policy and also show signs of good payment behavior, with the help of data mining tools. Design/methodology/approach: The approach of this research is to use data mining tools...

Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach

Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

AqBC: A Multistrategy Approach for Constructive Induction

In order to obtain potentially interesting patterns and relations from large, distributed, heterogeneous databases, it is essential to employ an intelligent and automated KDD (Knowledge Discovery in Databases) process. One of the most important methodologies is an integration of diverse learning strategies that cooperatively performs a variety of techniques and achieves high quality knowledge. ...

Credit scoring in banks and financial institutions via data mining techniques: A literature review

This paper presents a comprehensive review of the work done during 2000–2012 on the application of data mining techniques in credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...

Publication date: 2003